Author Rebuttal

Neural Information Processing Systems

We thank all the Reviewers for their feedback and their service to the community. We respond to all the raised issues and describe how we will address them, which will certainly improve our work; our responses are summarized in the following paragraphs. In particular, taking localization to mean "analysis of the variance-constrained star-convex hull" raises a difficulty; taking localization instead to mean "constrain by raw variance," the above issue is solved, but a new one arises. We intend to fully explore this connection in future work.


Learning Bounds for Risk-sensitive Learning

Neural Information Processing Systems

… CVaR minimization algorithm to account for covariate shift in the data-generating distribution. The advantage of risk-sensitive (either risk-seeking or risk-averse) objectives in machine learning, however, is not limited to tasks involving social considerations. Indeed, there is a rich body of work that implicitly proposes minimizing risk-sensitive measures of loss as a technique to better optimize the standard expected loss.
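
As a concrete illustration of the risk-averse objective in question (a sketch of ours, not the paper's code; all names are illustrative), the empirical CVaR at level alpha can be computed via the Rockafellar-Uryasev representation:

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Empirical CVaR_alpha via the Rockafellar-Uryasev representation:
    CVaR_alpha(L) = min_t { t + E[(L - t)_+] / (1 - alpha) }.
    The minimizing t is the empirical alpha-quantile (the Value at Risk),
    so this reduces to averaging the worst (1 - alpha) fraction of losses."""
    losses = np.asarray(losses, dtype=float)
    t = np.quantile(losses, alpha)  # empirical VaR_alpha
    return t + np.mean(np.maximum(losses - t, 0.0)) / (1.0 - alpha)

rng = np.random.default_rng(0)
losses = rng.exponential(scale=1.0, size=10_000)
print(empirical_cvar(losses, alpha=0.95))  # risk-averse: mean of the worst 5%
```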




Review for NeurIPS paper: Sharp uniform convergence bounds through empirical centralization

Neural Information Processing Systems

Weaknesses: The authors consider the supremum of the absolute deviation, \sup_f |\hat{E}_x[f] - E_D[f]|. In statistical learning theory, it suffices to estimate the one-sided quantity \sup_f [E_D[f] - \hat{E}_x[f]], i.e., without the absolute value. The results in this paper may therefore be of limited interest, since centralization does not help for the smaller complexity measure that already suffices for generalization. Centralization in terms of the (true) expectation has been considered in the literature for both Rademacher averages and variances; this paper extends that centralization to empirical centralization.
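
To make the notions concrete, here is a minimal Monte Carlo sketch (ours, not the paper's) of the empirical Rademacher average of a finite function family, with an optional empirical-centralization step that shifts each function by its empirical mean; all names are illustrative:

```python
import numpy as np

def empirical_rademacher(F_values, n_draws=1000, centralize=False, seed=0):
    """Monte Carlo estimate of the empirical Rademacher average of a finite
    function family: E_sigma[ sup_f (1/m) sum_i sigma_i f(x_i) ].
    F_values[k, i] holds f_k(x_i) on the sample x_1, ..., x_m. With
    centralize=True, each f_k is first replaced by f_k - Ehat_x[f_k],
    i.e., the empirical centralization discussed in the review."""
    rng = np.random.default_rng(seed)
    F = np.asarray(F_values, dtype=float)
    if centralize:
        F = F - F.mean(axis=1, keepdims=True)
    m = F.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, m))  # Rademacher signs
    return float(np.mean(np.max(sigma @ F.T / m, axis=1)))

rng = np.random.default_rng(1)
F_values = rng.normal(loc=3.0, scale=1.0, size=(50, 200))  # 50 functions, 200 points
print(empirical_rademacher(F_values))                   # plain
print(empirical_rademacher(F_values, centralize=True))  # empirically centralized
```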


A Reliability Theory of Compromise Decisions for Large-Scale Stochastic Programs

Diao, Shuotao, Sen, Suvrajeet

arXiv.org Artificial Intelligence

Stochastic programming models can lead to very large-scale optimization problems for which it may be impossible to enumerate all possible scenarios. In such cases, one adopts a sampling-based solution methodology, in which case the reliability of the resulting decisions may be suspect. For such instances, it is advisable to adopt methodologies that promote variance reduction. One such approach falls under a framework known as the "compromise decision", which requires multiple replications of the solution procedure. This paper studies the reliability of stochastic programming solutions resulting from the compromise-decision process, which is characterized by minimizing an aggregation of objective function approximations across replications, presumably conducted in parallel; we refer to the resulting post-parallel-processing problem as the "compromise decision" problem. We quantify the reliability of compromise decisions by estimating the expectation and variance of the "pessimistic distance" of sampled instances from the set of true optimal decisions, where the pessimistic distance is an estimate of the largest possible distance between the solution of a sampled instance and the "true" optimal solution set. The Rademacher average of instances is used to bound the sample complexity of the compromise decision.
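
A toy sketch of the aggregation step, under assumptions of ours (a quadratic scenario cost standing in for the true recourse objective, and unconstrained decisions; the framework's regularization/proximal term is omitted):

```python
import numpy as np
from scipy.optimize import minimize

def scenario_cost(x, xi):
    # Toy convex scenario cost; a stand-in for the true recourse objective.
    return np.sum((x - xi) ** 2)

def make_replication_objective(scenarios):
    # Sampled objective f_r(x): average cost over this replication's scenarios.
    return lambda x: np.mean([scenario_cost(x, xi) for xi in scenarios])

rng = np.random.default_rng(0)
dim, n_reps, n_scen = 3, 5, 100
objectives = [make_replication_objective(rng.normal(size=(n_scen, dim)))
              for _ in range(n_reps)]

# Per-replication decisions (what each parallel run would return on its own).
solutions = [minimize(f, np.zeros(dim)).x for f in objectives]

# Compromise decision: minimize the aggregation (here, the plain average)
# of the sampled objectives across replications.
compromise = minimize(lambda x: np.mean([f(x) for f in objectives]),
                      np.zeros(dim)).x
print(np.round(solutions, 3))
print(np.round(compromise, 3))
```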


On U-processes and clustering performance

Neural Information Processing Systems

Many clustering techniques aim to optimize empirical criteria that take the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within-cluster point scatter over a class of partitions of the feature space. The purpose of this paper is to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods.
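
For concreteness, the within-cluster point scatter is the degree-two U-statistic computed below; this is a minimal illustration of ours, with squared Euclidean distance as the dissimilarity:

```python
import numpy as np

def within_cluster_point_scatter(X, labels):
    """Degree-two U-statistic:
    W = (2 / (n(n-1))) * sum_{i<j} D(x_i, x_j) * 1{labels[i] == labels[j]},
    here with squared Euclidean distance as the dissimilarity D."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                total += np.sum((X[i] - X[j]) ** 2)
    return 2.0 * total / (n * (n - 1))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
good = np.array([0] * 50 + [1] * 50)    # partition matching the true groups
bad = rng.integers(0, 2, size=100)      # arbitrary partition, for contrast
print(within_cluster_point_scatter(X, good))  # small scatter
print(within_cluster_point_scatter(X, bad))   # larger scatter
```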


To Pool or Not To Pool: Analyzing the Regularizing Effects of Group-Fair Training on Shared Models

Cousins, Cyrus, Kumar, I. Elizabeth, Venkatasubramanian, Suresh

arXiv.org Artificial Intelligence

In fair machine learning, one source of performance disparities between groups is overfitting to groups with relatively few training samples. We derive group-specific bounds on the generalization error of welfare-centric fair machine learning that benefit from the larger sample size of the majority group. We do this by considering group-specific Rademacher averages over a restricted hypothesis class, which contains the family of models likely to perform well with respect to a fair learning objective (e.g., a power mean). Our simulations demonstrate that these bounds improve over a naive method, as predicted by the theory, with particularly significant improvement for smaller group sizes.
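
As a small illustration of the power-mean objective the abstract mentions (a sketch of ours; the groups, weights, and risk values are hypothetical):

```python
import numpy as np

def power_mean(group_risks, p, weights=None):
    """Weighted p-power mean M_p(r) = (sum_g w_g * r_g**p) ** (1/p) of
    per-group risks. p = 1 is the weighted average risk; as p grows the
    objective approaches the worst-group risk, i.e., it is more egalitarian."""
    r = np.asarray(group_risks, dtype=float)
    w = (np.full(len(r), 1.0 / len(r)) if weights is None
         else np.asarray(weights, dtype=float))
    if np.isinf(p):
        return float(r.max())
    return float(np.sum(w * r ** p) ** (1.0 / p))

group_risks = [0.10, 0.25, 0.40]  # hypothetical per-group empirical risks
for p in (1, 2, 8, np.inf):
    print(p, round(power_mean(group_risks, p), 4))
```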


Two-Layer Generalization Analysis for Ranking Using Rademacher Average

Neural Information Processing Systems

This paper is concerned with the generalization analysis of learning to rank for information retrieval (IR). In IR, data are hierarchically organized, i.e., consisting of queries and documents per query. Previous generalization analyses for ranking, however, have not fully considered this structure, and cannot explain how the simultaneous change of query number and document number in the training data will affect the performance of algorithms. In this paper, we propose performing generalization analysis under the assumption of two-layer sampling: queries are sampled i.i.d., and documents are then sampled i.i.d. for each query. Such a sampling scheme better describes the generation mechanism of real data, and the corresponding generalization analysis can better explain the real behaviors of learning-to-rank algorithms.
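
A minimal sketch (ours, not the paper's) of how an empirical risk is formed under two-layer sampling, where the query count and the per-query document count enter separately:

```python
import numpy as np

def two_layer_empirical_risk(per_query_losses):
    """Average the per-document losses within each query, then average across
    queries. Queries are drawn i.i.d.; documents are drawn i.i.d. given their
    query, so query number and per-query document number drive the bound separately."""
    return float(np.mean([np.mean(doc_losses) for doc_losses in per_query_losses]))

# Hypothetical data: 20 queries, each with a different number of judged documents.
rng = np.random.default_rng(0)
per_query_losses = [rng.uniform(0.0, 1.0, size=rng.integers(5, 50)) for _ in range(20)]
print(two_layer_empirical_risk(per_query_losses))
```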


Sharper convergence bounds of Monte Carlo Rademacher Averages through Self-Bounding functions

Pellegrina, Leonardo

arXiv.org Machine Learning

We derive sharper probabilistic concentration bounds for the Monte Carlo Empirical Rademacher Average (MCERA), proved through recent results on the concentration of self-bounding functions. Our novel bounds yield sharper bounds on (local) Rademacher averages. We also derive novel variance-aware bounds for the special case where only one vector of Rademacher random variables is used to compute the MCERA. We then leverage the framework of self-bounding functions to derive novel probabilistic bounds on the supremum deviations, which may be of independent interest.
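
For a finite family, the n-draw MCERA can be computed directly; the following is a minimal sketch of ours, where n_draws = 1 corresponds to the single-vector case mentioned in the abstract:

```python
import numpy as np

def mcera(F_values, n_draws=1, seed=0):
    """n-draw Monte Carlo Empirical Rademacher Average of a finite family:
    (1/n) * sum_{j=1..n} sup_f (1/m) * sum_{i=1..m} sigma_{j,i} * f(x_i),
    where F_values[k, i] holds f_k(x_i). n_draws=1 is the single-vector
    case for which the abstract derives variance-aware bounds."""
    rng = np.random.default_rng(seed)
    F = np.asarray(F_values, dtype=float)
    m = F.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, m))  # Rademacher variables
    return float(np.mean(np.max(sigma @ F.T / m, axis=1)))

rng = np.random.default_rng(1)
F_values = rng.uniform(-1.0, 1.0, size=(100, 500))  # 100 functions on 500 points
print(mcera(F_values, n_draws=1))   # one Rademacher vector
print(mcera(F_values, n_draws=25))  # averaged over 25 vectors
```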